Abstract

We prove that for any Monte Carlo algorithm of Metropolis type, the
autocorrelation time of a suitable "energy"-like observable is bounded below by
a multiple of the corresponding "specific heat". This bound does not depend
on whether the proposed moves are local or non-local; it depends only on the
distance between the desired probability distribution $\pi$ and the probability
distribution $\pi^{(0)}$ for which the proposal matrix satisfies detailed balance. We show,
with several examples, that this result is particularly powerful when applied to
non-local algorithms.

Forty years ago, Metropolis et al. [1] introduced a general method for constructing
dynamic Monte Carlo algorithms (= Markov chains [?]) that satisfy detailed balance
for a specified probability distribution $\pi$. In this note we point out a
general limitation on all algorithms of Metropolis type. We prove that the
autocorrelation time of a suitable "energy"-like observable is bounded below by a multiple
of the corresponding "specific heat". This bound does not depend on whether the
proposed moves are local or non-local; it depends only on the distance between the
desired probability distribution $\pi$ and the probability distribution $\pi^{(0)}$ for which the
proposal matrix satisfies detailed balance.

Let us begin by recalling the general Metropolis et al. method, as slightly
generalized by Hastings [?]. We use the notation of a discrete (finite or countably
infinite) state space $S$, but the same considerations apply with minor modifications to
a general measurable state space. Let $P^{(0)} = \{p^{(0)}_{xy}\}$ be an arbitrary transition matrix
on $S$. We call $P^{(0)}$ the proposal matrix, and use it to generate proposed moves $x \to y$
that will then be accepted or rejected with probabilities $a_{xy}$ and $1 - a_{xy}$, respectively.
If a proposed move is rejected, we make a "null transition" $x \to x$. The transition
matrix $P = \{p_{xy}\}$ of the full algorithm is thus

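$$ p_{xy} = \begin{cases} p^{(0)}_{xy}\, a_{xy} & \text{for } x \neq y \\ p^{(0)}_{xx} + \sum_{z \neq x} p^{(0)}_{xz}\, (1 - a_{xz}) & \text{for } x = y \end{cases} \tag{1} $$

where of course we must have $0 \le a_{xy} \le 1$ for all $x, y$. It is easy to see that $P$ satisfies
detailed balance for $\pi$ if and only if

$$ \frac{a_{xy}}{a_{yx}} = \frac{\pi_y\, p^{(0)}_{yx}}{\pi_x\, p^{(0)}_{xy}} \tag{2} $$

for all pairs $x \neq y$. But this is easily arranged: just set

$$ a_{xy} = F\!\left( \frac{\pi_y\, p^{(0)}_{yx}}{\pi_x\, p^{(0)}_{xy}} \right) , \tag{3} $$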



where $F: [0, +\infty] \to [0, 1]$ is any function satisfying

$$ \frac{F(z)}{F(1/z)} = z \quad \text{for all } z . \tag{4} $$

The choice suggested by Metropolis et al. [1] is

$$ F_{\mathrm{Metr}}(z) = \min(z, 1) . \tag{5} $$

Other choices of $F$ are possible, but it is easy to see that they all must satisfy the
inequality

$$ F(z) \le \min(z, 1) . \tag{6} $$

Of course, it is still necessary to check that $P$ is irreducible (= ergodic); this is usually
straightforward.
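The construction in (1)-(3) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the three-state target `pi`, the proposal matrix `P0`, and the helper names are made up for the example, and `pi[x] > 0` is assumed for every state. The alternative choice $F(z) = z/(1+z)$ is not mentioned in the text; it is included as one function that satisfies (4) and (6).

```python
from math import isclose

def metropolis_matrix(pi, P0, F=lambda z: min(z, 1.0)):
    """Build P = {p_xy} from a target pi and a proposal matrix P0 via eqs. (1)-(3).

    Assumes pi[x] > 0 for all x and that each row of P0 sums to 1.
    """
    n = len(pi)
    P = [[0.0] * n for _ in range(n)]
    for x in range(n):
        for y in range(n):
            if x != y and P0[x][y] > 0.0:
                z = (pi[y] * P0[y][x]) / (pi[x] * P0[x][y])   # argument of F in eq. (3)
                P[x][y] = P0[x][y] * F(z)                     # eq. (1), off-diagonal part
        P[x][x] = 1.0 - sum(P[x])   # rejected proposals become null transitions x -> x
    return P

def satisfies_detailed_balance(pi, P, tol=1e-12):
    """Check pi_x p_xy == pi_y p_yx for all pairs of states."""
    n = len(pi)
    return all(isclose(pi[x] * P[x][y], pi[y] * P[y][x], abs_tol=tol)
               for x in range(n) for y in range(n))

# Hypothetical example data: a 3-state target and a non-symmetric proposal matrix.
pi = [0.5, 0.3, 0.2]
P0 = [[0.2, 0.5, 0.3],
      [0.1, 0.6, 0.3],
      [0.4, 0.4, 0.2]]

P_metropolis = metropolis_matrix(pi, P0)                          # F from eq. (5)
P_barker = metropolis_matrix(pi, P0, F=lambda z: z / (1.0 + z))   # another F obeying eq. (4)

print(satisfies_detailed_balance(pi, P0))            # the raw proposal is not reversible for pi
print(satisfies_detailed_balance(pi, P_metropolis))
print(satisfies_detailed_balance(pi, P_barker))
```

Because every admissible $F$ obeys (6), the off-diagonal entries of `P_barker` never exceed those of `P_metropolis`.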

Note that if the proposal matrix $P^{(0)}$ happens to already satisfy detailed balance
for $\pi$, then we have $\pi_y p^{(0)}_{yx} / \pi_x p^{(0)}_{xy} = 1$, so that $a_{xy} = 1$ (if we use the Metropolis choice
of $F$) and $P = P^{(0)}$. On the other hand, no matter what $P^{(0)}$ is, we obtain a matrix
$P$ that satisfies detailed balance for $\pi$. So the Metropolis procedure can be thought
of as a prescription for minimally modifying a given transition matrix $P^{(0)}$ so that it
satisfies detailed balance for $\pi$.

Let us now assume that $P^{(0)}$ satisfies detailed balance for some probability measure
$\pi^{(0)}$; in practice this is virtually always the case. We then define an energy-like
observable $H$ by

$$ H(x) = \begin{cases} -\log\left( \pi_x / \pi^{(0)}_x \right) & \text{if } \pi_x > 0 \\ +\infty & \text{if } \pi_x = 0 \end{cases} \tag{7} $$

The point is that $H$ is the "energy" of the probability distribution $\pi$ relative to $\pi^{(0)}$.

The heart of our argument is the following upper bound on the mean-square
change in energy in a single step of the Metropolis algorithm:

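Proposition. In the situation described above, we always have

$$ \sum_{x,x'} \pi_x\, p_{xx'}\, [H(x') - H(x)]^2 \le \frac{8}{e^2}\, f_+ , \tag{8} $$

where

$$ f_+ = \sum_{x,x':\; H(x') > H(x)} \pi_x\, p^{(0)}_{xx'} \le 1 \tag{9} $$

is the fraction (in equilibrium) of proposals that would strictly increase the energy.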

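Proof. Since $P$ satisfies detailed balance for $\pi$, the summand in (8) is symmetric
under $x \leftrightarrow x'$. Therefore it suffices to consider the terms for which $H(x') > H(x)$,
and to multiply the result by 2. (The terms having $H(x') = H(x)$ of course make no
contribution to the sum.)

If $H(x') > H(x)$, we have $a_{xx'} \le e^{-[H(x') - H(x)]}$ by (3) and (6): because $P^{(0)}$ satisfies
detailed balance for $\pi^{(0)}$, the argument of $F$ in (3) equals $e^{-[H(x') - H(x)]}$. Therefore

$$ \begin{aligned} \sum_{x,x':\; H(x') > H(x)} \pi_x\, p_{xx'}\, [H(x') - H(x)]^2 &= \sum_{x,x':\; H(x') > H(x)} \pi_x\, p^{(0)}_{xx'}\, a_{xx'}\, [H(x') - H(x)]^2 \\ &\le \sum_{x,x':\; H(x') > H(x)} \pi_x\, p^{(0)}_{xx'}\, e^{-[H(x') - H(x)]}\, [H(x') - H(x)]^2 \\ &\le \frac{4}{e^2}\, f_+ \end{aligned} \tag{10} $$

since $z^2 e^{-z} \le 4/e^2$ for all $z \ge 0$. $\Box$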
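The elementary inequality used in the last step of the proof, namely that $z^2 e^{-z}$ never exceeds $4/e^2$ on $z \ge 0$, follows from a one-line calculus check, written out here for convenience:

```latex
\frac{d}{dz}\left( z^2 e^{-z} \right) = (2z - z^2)\, e^{-z} = 0
\quad\Longrightarrow\quad z = 2 \ \ (\text{for } z > 0),
\qquad
\max_{z \ge 0}\, z^2 e^{-z} = 4 e^{-2} \approx 0.541 .
```

Doubling this constant to account for both signs of the energy change gives the factor $8/e^2 \approx 1.083$ in the Proposition.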

The physical intuition behind this proof is simple: Proposed moves having a large
energy change $\Delta H > 0$ have an exponentially small acceptance probability, so the
mean-square energy increase $\langle (\Delta H)^2 \rangle$ in a single Metropolis step is at most of order
1. Proposed moves having an energy change $\Delta H < 0$ are connected to those with
$\Delta H > 0$ by detailed balance: when proposed they are accepted, but if $|\Delta H|$ is large
they are only rarely proposed. The result is that the mean-square energy change in
either direction is at most of order 1.

Let us now recall the definitions of autocorrelation functions and autocorrelation
times [?]: If $A$ is a real-valued function defined on the state space $S$ (i.e. a real-valued
observable), we define its unnormalized autocorrelation function (in equilibrium) by

$$ C_{AA}(t) = \langle A_s A_{s+t} \rangle - \mu_A^2 \tag{11a} $$

$$ \phantom{C_{AA}(t)} = \sum_{x,y} A(x)\, [\pi_x (P^{|t|})_{xy} - \pi_x \pi_y]\, A(y) , \tag{11b} $$

where $\mu_A \equiv \langle A \rangle_\pi$ is the equilibrium mean of $A$.

The corresponding normalized autocorrelation function is

$$ \rho_{AA}(t) = C_{AA}(t) / C_{AA}(0) . \tag{12} $$

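The integrated and exponential autocorrelation times are then defined by

$$ \tau_{\mathrm{int},A} = \frac{1}{2} \sum_{t=-\infty}^{\infty} \rho_{AA}(t) \tag{13} $$

$$ \tau_{\mathrm{exp},A} = \limsup_{t \to \infty} \frac{t}{-\log |\rho_{AA}(t)|} \tag{14} $$

$$ \tau_{\mathrm{exp}} = \sup_A \tau_{\mathrm{exp},A} \tag{15} $$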

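Some simple identities are worth noting:

$$ C_{AA}(0) = \langle A^2 \rangle_\pi - \langle A \rangle_\pi^2 \tag{16a} $$

$$ C_{AA}(1) = C_{AA}(0) - \frac{1}{2} \sum_{x,x'} \pi_x\, p_{xx'}\, [A(x') - A(x)]^2 \tag{16b} $$

Also, from detailed balance combined with the spectral theorem one can deduce the
following inequalities:

$$ \tau_{\mathrm{int},A} \ge \frac{1}{2}\, \frac{1 + \rho_{AA}(1)}{1 - \rho_{AA}(1)} \tag{17} $$

$$ \tau_{\mathrm{exp}} \ge \tau_{\mathrm{exp},A} \ge -1 / \log |\rho_{AA}(1)| \tag{18} $$

(see e.g. [?, Appendix A]).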
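The definitions (11)-(15) and the relations (16b), (17) and (18) can be checked numerically on a small reversible chain. The sketch below is only an illustration: the three-state transition matrix, the observable `A`, the truncation of the sum in (13) at `t = 500`, and the finite-`t` estimate of the exponential autocorrelation time are all assumptions made for the example.

```python
from math import log

def matvec(P, v):
    """Apply the transition matrix to a function on the state space: (P v)(x)."""
    n = len(v)
    return [sum(P[x][y] * v[y] for y in range(n)) for x in range(n)]

def autocovariances(pi, P, A, t_max):
    """Return [C_AA(0), ..., C_AA(t_max)] in equilibrium, following eq. (11b)."""
    n = len(pi)
    mu = sum(pi[x] * A[x] for x in range(n))
    centered = [A[x] - mu for x in range(n)]
    v = centered[:]
    C = []
    for _ in range(t_max + 1):
        C.append(sum(pi[x] * centered[x] * v[x] for x in range(n)))
        v = matvec(P, v)   # after t passes, v is P^t applied to the centered observable
    return C

# Hypothetical example: a symmetric P is reversible for the uniform distribution.
pi = [1 / 3, 1 / 3, 1 / 3]
P = [[0.8, 0.2, 0.0],
     [0.2, 0.6, 0.2],
     [0.0, 0.2, 0.8]]
A = [0.0, 1.0, 3.0]

C = autocovariances(pi, P, A, 500)
rho = [c / C[0] for c in C]                      # eq. (12)
tau_int = 0.5 + sum(rho[1:])                     # eq. (13), using rho(-t) = rho(t), truncated
tau_exp_estimate = 500 / (-log(abs(rho[500])))   # eq. (14) evaluated at one large t

half_msq = 0.5 * sum(pi[x] * P[x][y] * (A[y] - A[x]) ** 2
                     for x in range(3) for y in range(3))   # last term of eq. (16b)
bound_17 = 0.5 * (1.0 + rho[1]) / (1.0 - rho[1])
bound_18 = -1.0 / log(abs(rho[1]))

print(C[1], C[0] - half_msq)       # the two sides of eq. (16b)
print(tau_int, bound_17)           # eq. (17)
print(tau_exp_estimate, bound_18)  # eq. (18)
```

In this example the observable overlaps two decaying modes, so (17) holds as a strict inequality; it becomes an equality when the centered observable is a single eigenvector of `P`.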

With these preliminaries, the following theorem is an immediate consequence of 
the Proposition: 

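Theorem. Under the preceding hypotheses, we have

$$ \tau_{\mathrm{int},H} \ge \frac{e^2}{4 f_+}\, \mathrm{var}(H) - \frac{1}{2} \tag{19a} $$

$$ \tau_{\mathrm{exp}} \ge -1 / \log\!\left( 1 - \frac{4 f_+}{e^2\, \mathrm{var}(H)} \right) \tag{19b} $$

where $\mathrm{var}(H) = \langle H^2 \rangle_\pi - \langle H \rangle_\pi^2$.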



Proof. From the Proposition together with (16), we get

$$ \rho_{HH}(1) \ge 1 - \frac{4 f_+}{e^2\, \mathrm{var}(H)} . \tag{20} $$

Now use (17) and (18).



Again the physical intuition is simple: The mean-square energy change per
Metropolis step is at most of order 1. On the other hand, in order to sample adequately
the probability distribution $\pi$, the Markov chain must traverse an energy distribution
of width $\sim \mathrm{var}(H)^{1/2}$. This takes a time of order $(\mathrm{var}(H)^{1/2})^2 = \mathrm{var}(H)$.
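The Proposition and the Theorem can be verified on a toy model. The sketch below is an illustration only: it assumes a six-state space with made-up energy levels `h`, a uniform proposal matrix (which satisfies detailed balance for the uniform measure `pi0`), the Metropolis choice (5), and a truncation of the autocorrelation sum at `t = 500`.

```python
from math import exp, log, e

n = 6
h = [0.0, 0.5, 1.0, 2.0, 3.0, 5.0]           # hypothetical energy levels
pi0 = [1.0 / n] * n                          # reference measure pi^(0)
P0 = [[1.0 / n] * n for _ in range(n)]       # uniform proposals, reversible for pi0
Z = sum(exp(-hx) for hx in h)
pi = [exp(-hx) / Z for hx in h]              # target distribution

H = [-log(pi[x] / pi0[x]) for x in range(n)]   # eq. (7)

# Metropolis transition matrix, eqs. (1), (3), (5).
P = [[0.0] * n for _ in range(n)]
for x in range(n):
    for y in range(n):
        if x != y:
            z = (pi[y] * P0[y][x]) / (pi[x] * P0[x][y])
            P[x][y] = P0[x][y] * min(z, 1.0)
    P[x][x] = 1.0 - sum(P[x])

# Left-hand side of eq. (8) and the quantity f_+ of eq. (9).
msq = sum(pi[x] * P[x][y] * (H[y] - H[x]) ** 2 for x in range(n) for y in range(n))
f_plus = sum(pi[x] * P0[x][y] for x in range(n) for y in range(n) if H[y] > H[x])

mean_H = sum(pi[x] * H[x] for x in range(n))
var_H = sum(pi[x] * (H[x] - mean_H) ** 2 for x in range(n))
rho1 = 1.0 - 0.5 * msq / var_H               # eq. (16b) divided by C_HH(0)

# Integrated autocorrelation time of H, eq. (13), truncated at t = 500.
centered = [H[x] - mean_H for x in range(n)]
v = centered[:]
tau_int_H = 0.5
for t in range(1, 501):
    v = [sum(P[x][y] * v[y] for y in range(n)) for x in range(n)]
    tau_int_H += sum(pi[x] * centered[x] * v[x] for x in range(n)) / var_H

bound_8 = 8.0 / e ** 2 * f_plus
bound_20 = 1.0 - 4.0 * f_plus / (e ** 2 * var_H)
bound_19a = e ** 2 * var_H / (4.0 * f_plus) - 0.5

print(msq, bound_8)            # Proposition, eq. (8)
print(rho1, bound_20)          # eq. (20)
print(tau_int_H, bound_19a)    # Theorem, eq. (19a)
```

For such a small system the bounds are far from saturated; the Theorem becomes informative when $\mathrm{var}(H)$ is large, as in the examples that follow.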

Example 1. Single-site Metropolis algorithm. Here $\pi^{(0)}$ is the a priori measure
for the spins, and $H$ is the full Hamiltonian. $P^{(0)}$ selects a spin at random and
proposes to update it in some way that satisfies detailed balance for $\pi^{(0)}$. We have
$\mathrm{var}(H) = V C_H$, where $V$ is the volume and $C_H$ is the specific heat. So the Theorem
shows that

$$ \tau_{\mathrm{int},H},\ \tau_{\mathrm{exp},H} \gtrsim V C_H , \tag{21} $$

where time is here measured in hits of a single site; or equivalently $\tau \gtrsim C_H$ when
time is measured in "sweeps". This is a well-known result. However, it is a rather
poor bound because the energy, being a short-distance observable, has a rather weak
overlap with the slowest (long-wavelength) modes of this local dynamics. (A much
stronger bound can be obtained by using the magnetization $\mathcal{M}$ rather than the energy
as the trial function: one gets $\tau_{\mathrm{int},\mathcal{M}},\ \tau_{\mathrm{exp},\mathcal{M}} \gtrsim \chi$, where $\chi$ is the susceptibility [?, ?].)

The real power of the Theorem comes when it is applied to non-local algorithms:
it still yields $\tau \gtrsim V C_H$, but now the unit of time (a "hit" of $P^{(0)}$) is a non-local move
which costs a CPU time $\gg 1$. As a result, several algorithms which a priori look
promising must in fact perform rather poorly:

Example 2. $q$-state Potts model with mixed ferromagnetic/antiferromagnetic
interaction [?]. The purely ferromagnetic Potts model can be simulated very
efficiently by the Swendsen-Wang (SW) algorithm [?] or its single-cluster (1CSW)
variant [10], but these algorithms do not extend easily to the mixed
ferromagnetic/antiferromagnetic case. One might therefore try using the SW or 1CSW
algorithm for the ferromagnetic part of the Hamiltonian as a Metropolis proposal for the
full theory. Thus, let $\pi^{(0)}$ (resp. $\pi$) be the Gibbs measure for the ferromagnetic (resp.
full) theory, so that $H$ is the antiferromagnetic part of the Hamiltonian. Let $P^{(0)}$ be
any algorithm that satisfies detailed balance for $\pi^{(0)}$ (for example, SW or 1CSW);
and let $P$ be the corresponding Metropolis algorithm for $\pi$. One expects $\mathrm{var}(H)$ to
behave near criticality as $\sim J_{AF}^2 V C_H$, where $J_{AF}$ is the antiferromagnetic coupling. So
the Theorem shows that

$$ \tau_{\mathrm{int},H},\ \tau_{\mathrm{exp},H} \gtrsim J_{AF}^2 V C_H , \tag{22} $$

where time is here measured in hits of $P^{(0)}$. For SW (resp. 1CSW), each hit takes
a CPU time of order $V$ (resp. $\chi$). So the proposed algorithm must perform quite
poorly, except when $J_{AF}$ is very small.



Example 3. $d = 3$ Heisenberg model with topological term [?]. The
ferromagnetic Heisenberg model can be simulated very efficiently by the Wolff embedding
algorithm [?] using either SW or 1CSW moves to update the induced Ising model
[15]. The topological term seems difficult to incorporate into the cluster-algorithm
framework, but one might try using the SW or 1CSW algorithm for the
ferromagnetic two-body part of the Hamiltonian as a Metropolis proposal for the full theory.
(The intuitive idea is that a 1CSW move is likely to make a modest change in the
topological-charge field, so the acceptance rate should be reasonable.) Thus, let $\pi^{(0)}$
(resp. $\pi$) be the Gibbs measure for the ferromagnetic (resp. full) theory, so that $H$ is
the topological term. Let $P^{(0)}$ be any algorithm that satisfies detailed balance for $\pi^{(0)}$
(for example, SW or 1CSW); and let $P$ be the corresponding Metropolis algorithm
for $\pi$. One expects $\mathrm{var}(H)$ to behave near criticality as $\sim J_{\mathrm{top}}^2 V C_H$, where $J_{\mathrm{top}}$ is the
topological coupling and it is known that $C_H \to \mathrm{const} > 0$ at criticality (since
$\alpha < 0$). So the Theorem shows that

$$ \tau_{\mathrm{int},H},\ \tau_{\mathrm{exp},H} \gtrsim J_{\mathrm{top}}^2 V , \tag{23} $$

where time is here measured in hits of $P^{(0)}$. For SW (resp. 1CSW), each hit takes
a CPU time of order $V$ (resp. $\chi$). So the proposed algorithm must perform quite
poorly, except when $J_{\mathrm{top}}$ is very small.

Example 4. Self-avoiding walk with nearest-neighbor interaction. Fix an integer
$N$, and let $S$ be the space of all $N$-step self-avoiding walks on some specified lattice.
Let $\pi^{(0)}$ be the probability measure that gives equal weight to each element of $S$.
Then define the probability measure $\pi$ by

$$ \pi_\omega = Z(\varepsilon)^{-1}\, e^{-\varepsilon M(\omega)}\, \pi^{(0)}_\omega , \tag{24} $$

where $M(\omega)$ is the number of non-bonded nearest-neighbor contacts in the walk $\omega$. Let
$P^{(0)}$ be any algorithm that satisfies detailed balance for $\pi^{(0)}$ (e.g. the pivot algorithm
[17, 18]); and let $P$ be the corresponding Metropolis algorithm for $\pi$. Then the
Theorem shows that

$$ \tau_{\mathrm{int},M},\ \tau_{\mathrm{exp},M} \gtrsim \varepsilon^2\, \mathrm{var}_\varepsilon(M) / f , \tag{25} $$

where $f$ is the fraction of proposals with $\omega' \neq \omega$ (e.g. the fraction of proposed
pivot moves that preserve self-avoidance). And we expect $\mathrm{var}_\varepsilon(M) \approx N C(\varepsilon)$, where
the "specific heat per step" $C(\varepsilon)$ is everywhere nonzero and diverges like $(\varepsilon - \varepsilon_\theta)^{-\alpha}$
at the theta (tricritical) point.
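To make the observable $M(\omega)$ and the resulting acceptance probability concrete, here is a minimal Python sketch. It assumes the square lattice $\mathbb{Z}^2$ (the text leaves the lattice unspecified), represents a walk as a list of integer coordinate pairs, and uses hypothetical helper names. Since $\pi^{(0)}$ is uniform and $P^{(0)}$ satisfies detailed balance for it, the argument of $F$ in (3) reduces to $e^{-\varepsilon [M(\omega') - M(\omega)]}$.

```python
from math import exp

def contacts(walk):
    """Number of non-bonded nearest-neighbor contacts of a walk on the square lattice.

    `walk` is a list of (x, y) sites, assumed self-avoiding, in the order visited.
    """
    index = {site: i for i, site in enumerate(walk)}
    count = 0
    for i, (x, y) in enumerate(walk):
        # Look only to the right and upward so that each lattice bond is examined once.
        for dx, dy in ((1, 0), (0, 1)):
            j = index.get((x + dx, y + dy))
            if j is not None and abs(i - j) > 1:   # neighbors on the lattice, not along the walk
                count += 1
    return count

def acceptance(eps, walk_old, walk_new):
    """Metropolis acceptance probability min(1, exp(-eps * (M_new - M_old)))."""
    return min(1.0, exp(-eps * (contacts(walk_new) - contacts(walk_old))))

straight = [(0, 0), (1, 0), (2, 0), (3, 0)]   # no contacts
u_shape = [(0, 0), (1, 0), (1, 1), (0, 1)]    # first and last sites touch: one contact

print(contacts(straight))   # 0
print(contacts(u_shape))    # 1
print(acceptance(0.5, straight, u_shape))
print(acceptance(0.5, u_shape, straight))
```

With $\varepsilon > 0$, a proposal that creates contacts is accepted with probability below 1, while a proposal that removes contacts is always accepted.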

For the pivot algorithm, the bound (25) is a rather weak result: in fact we expect
that $\tau_{\mathrm{int},M},\ \tau_{\mathrm{exp},M} \sim N/f$ even for $\varepsilon = 0$, because $M$ is a "primarily local" observable
[18]. But (25) does show that for $\varepsilon \neq 0$ (and in particular for $\varepsilon \approx \varepsilon_\theta$) the
difficulties cannot be avoided by using a different proposal $P^{(0)}$; they are inherent in the
Metropolis method with this choice of $\pi^{(0)}$ [?].

We conclude by noting that the Metropolis et al. method is often applied
indirectly: we define transition matrices $P_1, \ldots, P_n$ by the Metropolis method, and we
then execute either $P = \sum_{i=1}^n \lambda_i P_i$ for some weights $\lambda_i > 0$ ("random updating") or
else $P = P_1 \cdots P_n$ ("sequential updating"). The first case can easily be handled by
our method. The second case is more subtle, because typically $P$ does not satisfy
detailed balance; but the bound is almost certainly correct in order of magnitude,
except in special situations like "successive overrelaxation".
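The difference between the two updating schemes can be seen on a three-state toy example. The sketch below is illustrative only: the target `pi`, the equal weights, and the helper `swap_kernel` (a Metropolis kernel that proposes moves between just two states) are assumptions made for the example.

```python
from math import isclose

def swap_kernel(pi, i, j):
    """Metropolis kernel that only proposes i -> j and j -> i (hypothetical helper)."""
    n = len(pi)
    P = [[1.0 if x == y else 0.0 for y in range(n)] for x in range(n)]
    P[i][j] = min(1.0, pi[j] / pi[i])
    P[i][i] = 1.0 - P[i][j]
    P[j][i] = min(1.0, pi[i] / pi[j])
    P[j][j] = 1.0 - P[j][i]
    return P

def matmul(A, B):
    n = len(A)
    return [[sum(A[x][k] * B[k][y] for k in range(n)) for y in range(n)] for x in range(n)]

def is_reversible(pi, P, tol=1e-12):
    """Detailed balance: pi_x p_xy == pi_y p_yx for all x, y."""
    n = len(pi)
    return all(isclose(pi[x] * P[x][y], pi[y] * P[y][x], abs_tol=tol)
               for x in range(n) for y in range(n))

def is_stationary(pi, P, tol=1e-12):
    """Stationarity: sum_x pi_x p_xy == pi_y for all y."""
    n = len(pi)
    return all(isclose(sum(pi[x] * P[x][y] for x in range(n)), pi[y], abs_tol=tol)
               for y in range(n))

pi = [0.5, 0.3, 0.2]
P1 = swap_kernel(pi, 0, 1)
P2 = swap_kernel(pi, 1, 2)

# Random updating with equal weights, and sequential updating P = P1 P2.
P_random = [[0.5 * P1[x][y] + 0.5 * P2[x][y] for y in range(3)] for x in range(3)]
P_sequential = matmul(P1, P2)

print(is_reversible(pi, P_random))       # True
print(is_stationary(pi, P_sequential))   # True
print(is_reversible(pi, P_sequential))   # False
```

Here the sequential product can move from state 0 to state 2 in one step but never from 2 to 0, so it leaves `pi` invariant without satisfying detailed balance.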